Missingness mechanisms and generalizability of patient reported outcome measures in colorectal cancer survivors – assessing the reasonableness of the “missing completely at random” assumption

Background Patient-Reported Outcome Measures (PROMs) provide important information; however, missing PROM data threaten the interpretability and generalizability of findings by introducing potential bias. This study aims to provide insight into missingness mechanisms and inform future researchers on generalizability and possible methodological solutions to overcome missing PROM data problems during data collection and statistical analyses. Methods We identified 10,236 colorectal cancer survivors (CRCs) older than 18 years, diagnosed between 2014 and 2018, through the Danish Clinical Registries. We invited a random 20% (2,097) to participate in a national survey in May 2023. We distributed reminder e-mails at day 10 and day 20, and compared Initial Responders (response day 0–9), Subsequent Responders (response day 10–28) and Non-responders (no response after 28 days) in demographic and cancer-related characteristics and PROM-scores using linear regression. Results Of the 2,097 CRCs, 1,188 responded (57%). Of these, 142 (7%) were excluded, leaving 1,955 eligible CRCs. 628 (32%) were categorized as initial responders, 418 (21%) as subsequent responders, and 909 (47%) as non-responders. Differences in demographic and cancer-related characteristics between the three groups were minor, and PROM-scores differed only marginally between initial and subsequent responders. Conclusion In this study of long-term colorectal cancer survivors, we showed that initial responders, subsequent responders, and non-responders exhibit comparable demographic and cancer-related characteristics. Among respondents, Patient-Reported Outcome Measures were also similar, indicating generalizability. Assuming Patient-Reported Outcome Measures of subsequent responders represent the answers non-responders would have given (had they been available), it may be reasonable to judge the missingness mechanism as Missing Completely At Random.


Background
Due to advancements in medical science, increasing numbers of people are living with and after colorectal cancer worldwide [1]. The value of new treatments is no longer determined only by how long patients survive, but also by how well they survive [2,3]. Quality of life measures are necessary to include as primary or at least secondary outcomes in interventional research [4-6].
Patient Reported Outcomes (PROs) [7,8] have gained attention in healthcare as a way to gather information directly from patients about their health conditions. Patient-Reported Outcome Measures (PROMs) are specific tools used to assess PROs, often through self-completed questionnaires [8]. PROMs provide valuable insights into the patient's perspective on treatment-related issues, functional abilities, and quality of life [9]. However, the nature of PROMs introduces certain challenges related to data collection [10], as they rely on patient willingness and ability to respond; hence the response can be affected by various factors such as illness severity and study design [11].
Missing data is a frequent problem in studies using PROMs [12,13], and is often addressed by conducting complete-case analyses and ignoring the missing data [14]. However, this method may lead to biased results, reduced statistical power to detect differences between treatments, limited generalizability and misleading conclusions [13,15-19].
In studies with missing data (> 5%), possible statistical solutions depend on the missingness mechanisms [16]. There are three missingness mechanisms: missing at random (MAR), missing not at random (MNAR) and missing completely at random (MCAR) [15,16,20]. If the probability of response depends only on the observed data, then data is said to be MAR. In case of MAR, missing data may be addressed using, e.g., multiple imputation to predict the missing values based on observed data. If the probability of response depends on the missing data in addition to the observed data, then data is said to be MNAR, and may be addressed by conducting sensitivity analyses with best-and-worst case scenarios. If the probability of response is not associated with either the observed or missing values, data is said to be MCAR. In that case, responders are considered representative of non-responders, and complete-case analyses may not cause bias, but only enlarged standard errors due to the reduced statistical power.
In a study using PROM scores as an outcome, MCAR implies that PROM values do not differ systematically between responders and non-responders. Likewise, responders and non-responders do not differ systematically with regard to other observed data. The MCAR assumption is hard to prove given that the PROMs are actually missing; however, a dataset collected with repeated reminders to respond may shed light upon the nature of missing PROMs. Comparing responders who respond following the first invitation (hereafter denoted "Initial responders") to responders who were non-responders until the 2nd or 3rd invitation but replied subsequently (hereafter denoted "Subsequent responders") creates an opportunity to evaluate the PROMs and characterize the patients who were non-responders at first, and to assess if the MCAR assumption may be reasonable. Comparing registry data between responders and non-responders provides insights into the generalizability of the respondents.
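The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the analyses in this study: the variable names, the sample size and the logistic response models are assumptions chosen for the example. It draws a covariate x that is always observed and an outcome y that may be missing, and compares the complete-case mean of y with the true mean under each mechanism.

```python
import math
import random
import statistics


def simulate_complete_case_means(n=20000, seed=1):
    """Return the true mean of y and the complete-case mean under MCAR, MAR and MNAR."""
    rng = random.Random(seed)
    # Hypothetical data: x is always observed (e.g. a standardized registry variable),
    # y is the outcome (e.g. a PROM score) and is correlated with x.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]

    def logistic(z):
        return 1 / (1 + math.exp(-z))

    def observed_mean(response_prob):
        # Keep y only for simulated responders.
        kept = [yi for xi, yi in zip(x, y) if rng.random() < response_prob(xi, yi)]
        return statistics.fmean(kept)

    return {
        "true": statistics.fmean(y),
        "MCAR": observed_mean(lambda xi, yi: 0.5),           # ignores x and y
        "MAR": observed_mean(lambda xi, yi: logistic(xi)),   # depends on observed x only
        "MNAR": observed_mean(lambda xi, yi: logistic(yi)),  # depends on the missing y itself
    }


if __name__ == "__main__":
    for mechanism, mean in simulate_complete_case_means().items():
        print(f"{mechanism}: {mean:.3f}")
```

Under these assumptions the complete-case mean is close to the true mean only under MCAR. Under MAR the bias can in principle be removed by using x (for example through multiple imputation), whereas under MNAR the observed data alone do not identify the bias.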
Acknowledging the issues with PROMs, including handling of missing data and related statistical approaches, we aimed to investigate the generalizability of respondents in a random sample of 20% of our registry-based national cohort of > 10,000 long-term colorectal cancer survivors (CRCs) before continuing with a large cross-sectional study [21] on the prevalence of symptoms indicative of psycho-oncological late effects and quality of life.
We assessed missingness mechanisms of PROM-data among initial responders and subsequent responders. We assumed that PROM-data from subsequent responders may be indicative of PROM-data from non-responders, since subsequent responders would have been non-responders had they not been prompted with reminders, and hence give insight into the missing data of the non-responders. Furthermore, we addressed the generalizability of all responders compared to non-responders by comparing clinical and demographic characteristics extracted from national registries. The study will provide insight into the missingness mechanisms in the setting of a nationwide cross-sectional study and add to the literature on issues with generalizability and possible methodological solutions to overcome missing data problems during statistical analyses. By addressing this issue associated with PROMs, researchers can better understand how to collect, interpret and utilize PROM data in future studies.

Method
This study complies with STROBE guidelines [22] for reporting observational studies in epidemiology. The data presented in this study is drawn from a population-based cross-sectional study of "Late Effects After Colorectal Cancer" conducted among Danish long-term (4-10 years post initial treatment) CRCs, which is also being used for recruitment to a randomized controlled trial (RCT) of an online psychological intervention; see the published protocol [21].

Study population
We identified eligible CRCs through the Danish Colorectal Cancer Group (DCCG) [23] database hosted by The Danish Clinical Registries (RKKP) with no need to enter confidential electronic patient records. We invited all Danish CRCs above age 18, able to read and understand Danish, who had completed curative-intent cancer treatment with surgery and/or radiation and/or adjuvant chemotherapy between March 2014 (when the national Danish Colorectal Cancer Screening Program was launched) and December 2018, to participate.

Data collection and definition of responders
Invited participants were a random 20% sample of the total cohort. We invited participants to answer an electronic questionnaire to screen for late effects. We distributed surveys through REDCap using the Danish nationwide secure, personal electronic mail box (e-Boks) between May 8th 2023 and May 12th 2023. In case of no answer on Day 10, the survey was automatically re-distributed. In case of no answer on Day 20, the survey was again re-distributed. Initial responders were defined as responders completing the questionnaire in the first 10 days following the initial invitation. Subsequent responders were defined as responders completing the questionnaire following either the first or second reminder (10-28 days after initial invitation). Non-responders were defined as invitees not completing the questionnaire within 4 weeks after last reminder. No incentives were offered.
The e-Boks system offers the option to set up automatic SMS/text message notifications whenever an email is received. Since this feature is personalized, we do not have specific data on its usage. No other phone calls or SMS/text messages were utilized to remind participants to respond.
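The grouping rule can be written as a short function. This is a minimal sketch, not the study's own code: the function name and the use of None for "no completed questionnaire" are hypothetical, and the cut-offs assume the day ranges stated in the abstract (day 0–9, day 10–28, no response after 28 days), counted from the initial invitation.

```python
from typing import Optional


def classify_responder(response_day: Optional[int]) -> str:
    """Classify an invitee by the number of days from initial invitation to response.

    response_day is None when no completed questionnaire was received.
    """
    if response_day is None or response_day > 28:
        return "non-responder"
    if response_day < 0:
        raise ValueError("response_day cannot be negative")
    if response_day <= 9:
        return "initial responder"      # responded before the first reminder (day 10)
    return "subsequent responder"       # responded on day 10-28, after a reminder was sent


if __name__ == "__main__":
    for day in (0, 9, 10, 28, None):
        print(day, classify_responder(day))
```

Because the classification relies on the response day alone, a person answering the initial invitation on day 11 without having seen a reminder would still be counted as a subsequent responder.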

Patient reported outcome measures (PROMs)
The selected PROMs aim to capture the diversity in psycho-oncological late effects experienced by colorectal cancer survivors. Participants completed an 83-item survey (see Table 1) comprising six questionnaires validated in Danish:

1. The Fear of Cancer Recurrence Inventory-Short Form (FCRI-SF) [24,25] measures severity of fear of cancer recurrence on a 9-item subscale with response categories on a 5-point Likert-like scale ranging from 0 to 4. Scale scores range from 0 to 36 with higher scores indicating greater severity.
2. The Fear of Cancer Recurrence-1 revised (FCR-1r) [26] measures fear of cancer recurrence on a single item ranging from 0 to 10. Higher scores indicate greater severity.
3. Symptom Checklist-90-R (SCL) [27] subscales [28] measure anxiety (SCL-anx) in 4 items, depression (SCL-dep) in 6 items and emotional distress (SCL-distress) in 8 items on a 5-point Likert-like scale ranging from 0 to 4, resulting in simple sum scores ranging from 0 to 16, 24 and 32, respectively.
4. The Whiteley-6 [29,30] measures health anxiety on a 5-point Likert-like scale with a simple sum score ranging from 0 to 24.
5. The EQ-5D-5L and EQ VAS scale [31] measure health state in five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression on 5 levels, resulting in a 5-digit number that describes the patient's health state. The VAS targets health state today and ranges from 0 to 100. EQ-5D-5L was analysed as a sum-score index value and standardized according to the US EQ-5D-5L value set as recommended by EuroQoL [32].
6. The BDS Checklist [33].

Quality of life was rated using a VAS scale (QoL VAS) inspired by the EuroQoL Health VAS scale, with the word "health" changed to "Quality of life". Additionally, the survivors were asked to report demographic and cancer-related outcomes supplementing the data drawn from registries.
In case of missing single items within the above-mentioned questionnaires, the item was replaced with a 0. This is a conservative approach, as it assumes that the symptom in question was not present.
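The item-level rule amounts to a simple sum score in which a missing item contributes nothing. The following is a minimal sketch under stated assumptions: the function name is hypothetical, missing items are represented as None, and items are scored 0 to 4 as for the Likert-like scales described above.

```python
from typing import Optional, Sequence


def sum_score(items: Sequence[Optional[int]], max_item: int = 4) -> int:
    """Simple sum score where missing items (None) are counted as 0."""
    total = 0
    for value in items:
        if value is None:
            continue  # missing item treated as 0, i.e. the symptom is assumed absent
        if not 0 <= value <= max_item:
            raise ValueError(f"item value {value} is outside 0-{max_item}")
        total += value
    return total


if __name__ == "__main__":
    # Hypothetical 4-item anxiety subscale with the second item missing.
    print(sum_score([2, None, 1, 3]))
```

With this rule a scale score can only be lowered by missing items, so symptom severity may be underestimated for respondents who skip items.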

Data cleaning
Returned questionnaires were excluded in case of cancer recurrence or if "dementia", "terminal/too somatic ill" or "dead" were communicated by caregivers to the primary investigator by telephone or email. In certain instances, survivors indicated "no memory of cancer" or "do not want to participate", leading to their exclusion. Similarly, responses lacking consent to participate were excluded. Cases that missed the reversed wording of item 5 on the FCRI-SF and presented with a response pattern of "0" were excluded, as this was interpreted as inattention or lack of motivation [34], see Fig. 1. Only cases with complete demographic data were analyzed, hence cases with missing items were also excluded.

Statistics
We compared demographic and cancer-related group characteristics (i.e., initial vs. subsequent vs. non-responders) using balance diagnostics and reported them as standardized differences accompanied by descriptive statistics. An absolute standardized difference of > 0.1 was interpreted as imbalance between groups [35]. Cancer-related characteristics were compared using the Chi-squared test and reported as absolute numbers, proportions and p-values. PROM-scores were analyzed using generalized linear regression models and reported as β-coefficients, 95% CI and p-values for subsequent responders using initial responders as the reference. Assumptions of normality, linearity and homoscedasticity were checked and found to be reasonable.
Analyses were performed on responders with complete PROMs and all non-responders (in total N = 1,955). Statistical analyses were performed in Stata 17 (Stata Corp., College Station, TX, USA).
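For readers unfamiliar with balance diagnostics, the sketch below shows the commonly used two-group standardized difference for a continuous and for a binary variable, together with the 0.1 threshold. It is an illustration only: the function names are hypothetical, the formulas are the standard textbook versions, and the exact implementation used in the statistical software (for example for variables with more than two categories) is not reproduced here.

```python
import math
from typing import Sequence


def std_diff_means(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized difference for a continuous variable (average of the two sample variances)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt((var_a + var_b) / 2)


def std_diff_proportions(p_a: float, p_b: float) -> float:
    """Standardized difference for a binary variable given the two group proportions."""
    pooled = (p_a * (1 - p_a) + p_b * (1 - p_b)) / 2
    return (p_a - p_b) / math.sqrt(pooled)


def is_imbalanced(d: float, threshold: float = 0.1) -> bool:
    """Flag an absolute standardized difference above the chosen threshold."""
    return abs(d) > threshold


if __name__ == "__main__":
    # Hypothetical proportions, not taken from the study tables.
    d = std_diff_proportions(0.50, 0.40)
    print(round(d, 3), is_imbalanced(d))
```

Unlike a p-value, the standardized difference does not depend on sample size, which is one reason it is often preferred for comparing group characteristics.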
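In a linear model with a single binary group indicator, the β-coefficient equals the difference in mean score between the two groups. The sketch below illustrates this for an unadjusted comparison; it is not the Stata code used in the study. The function name is hypothetical, and the 95% CI uses a normal approximation (z = 1.96), which is an assumption that is reasonable only for large samples.

```python
import math
import statistics
from typing import Sequence, Tuple


def group_beta_ci(initial: Sequence[float], subsequent: Sequence[float],
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Coefficient of score ~ group (0 = initial, 1 = subsequent) with an approximate 95% CI."""
    n0, n1 = len(initial), len(subsequent)
    mean0 = statistics.fmean(initial)
    mean1 = statistics.fmean(subsequent)
    beta = mean1 - mean0  # slope of the binary indicator = difference in group means
    # Residual variance of the two-group linear model (pooled within-group variance).
    ss0 = sum((v - mean0) ** 2 for v in initial)
    ss1 = sum((v - mean1) ** 2 for v in subsequent)
    resid_var = (ss0 + ss1) / (n0 + n1 - 2)
    se = math.sqrt(resid_var * (1 / n0 + 1 / n1))
    return beta, beta - z * se, beta + z * se


if __name__ == "__main__":
    # Hypothetical scores, not study data.
    print(group_beta_ci([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

A confidence interval that excludes 0 corresponds to a difference between initial and subsequent responders at the 5% level under these assumptions.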

Results
A total of 10,236 participants were diagnosed with colorectal cancer and treated with curative intent in the period March 2014 to December 2018. Of this cohort, 20% (2,097) were randomly selected for inclusion in this cross-sectional study and emailed a questionnaire between May 8th and May 12th 2023.

Differences in demographic characteristics
We observed minor differences in demographic characteristics. Non-responders tended to be marginally older than initial and subsequent responders (median age 75 vs. 73 and 71, standardized difference 0.18 and 0.16, respectively), and consequently a larger proportion of the subsequent responders were employed (27% vs. 23%, standardized difference 0.18), see Table 2. The initial and subsequent responders were similar in terms of education, sex, marital status, language, ethnicity and children.

Differences in cancer-related characteristics
We observed no major differences in cancer-related characteristics between initial responders, subsequent responders and non-responders. Among those eligible for enrollment in the national colorectal cancer screening program (aged 50-75 at the time of diagnosis), a marginally higher proportion of both initial and subsequent responders had their cancer detected through screening compared with non-responders (45% and 43% compared to 39%). However, no disparity was observed in terms of T-category. N-category differed between groups mainly due to an uneven distribution of missing data, see Table 3.

Differences in PROMs
Concerning the physical and psychological parameters, only the single-item PROMs (the FCR-1r and sexual function) differed between initial responders and subsequent responders, as subsequent responders reported slightly higher FCR-1r (β 0.45, 95% CI 0.13 to 0.78) and lower sexual dysfunction (β -0.24, 95% CI -0.41 to -0.08). We observed no difference in quality of life, health state, symptoms of anxiety, health anxiety, depression, or physical symptoms in general, see Table 4. As psychological parameters were measured through PROMs, we do not have data on non-responders.

Discussion
This study analysed generalizability and missingness mechanisms of PROMs in a random sample of long-term colorectal cancer survivors. Based on the present data we report a high degree of comparability between initial responders, subsequent responders and non-responders concerning demographic and cancer-related characteristics. In addition, initial and subsequent responders showed similar outcomes on PROMs, suggesting that the mechanisms of missing data among non-responders in this population can be assumed to be missing completely at random.
In research, high response rates are desirable to enhance validity and generalizability, and several strategies to enhance PRO response rates [10,11,13,36-40] and deal with missing data have been developed [16,20,41]. However, potential bias does not depend on the response rate, but on the degree of similarity between respondents and non-respondents [42]. Achieving high response rates increases the likelihood of similarity, but often requires significant resources for reminders (either postal mail, email or phone call) and may also be burdensome for the participants.
If reminders are to be distributed, the data collection process can be accelerated, as the response pattern of this population reveals a rapid decline in response rates similar to previous research [39], dropping below 1% within a week.
The findings in this study align with previous research demonstrating limited evidence of non-response bias. In a population-based survey of childhood, adolescent and young adult Norwegian long-term cancer survivors (the NOR-CAYACS study) [43] and in children participating in the Swiss Childhood Cancer survivor study [44] the authors found no evidence of relevant non-response bias in any of the investigated outcomes when comparing observed prevalence in respondents to expected prevalence in a constructed total population. Both studies were performed with postal surveys and a long period between initial invitation and reminders (up to five months), which does not represent the electronic era of today.
In contrast to our findings, observational data from the Dutch PROFILES study, which included a heterogeneous cancer population, found respondents to be healthier than the population of interest, despite achieving a high response rate of 69% [45]. Similarly, in a longitudinal examination of an American breast cancer population using PROMs, non-responders tended to be older and more frequently identified as non-English speaking, of Hispanic ethnicity, or of Black race compared to responders. Consequently, the authors concluded that the PROM results may not accurately represent the experiences of the entire American breast cancer population [46]. Downing et al. also found people of nonwhite ethnicity and those living in the most socioeconomically deprived areas less likely to participate in a population-level study, and concluded that the results presented may underestimate the true impact of colorectal cancer on health-related quality of life [47]. The Danish population of CRCs is relatively homogeneous, and the current study was not designed to specifically investigate the influence of language, socioeconomic status or ethnicity.
While the findings of this study cannot be extrapolated to clinical cancer surveillance in general, understanding the characteristics of patients who do not respond to PROMs is essential, especially given the current trend in survivorship care, which emphasizes a patient-centered approach, encouraging patients to self-report symptoms of recurrence, side effects, or late effects as they arise [48]. Based on this study, no explicit demographic or cancer-related characterization of this group can be made.
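The role of similarity between respondents and non-respondents can be illustrated with a simple identity for a mean: the full-sample mean is the response-rate-weighted average of the respondent and non-respondent means. The sketch below is an illustration with hypothetical numbers and a hypothetical function name; it assumes the non-respondent mean were known, which in practice it is not.

```python
def respondent_mean_bias(response_rate: float, mean_resp: float,
                         mean_nonresp: float) -> float:
    """Difference between the respondent mean and the full-sample mean."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Full-sample mean as a weighted average of the two groups.
    full_mean = response_rate * mean_resp + (1 - response_rate) * mean_nonresp
    return mean_resp - full_mean


if __name__ == "__main__":
    # Equal means: no bias even at a moderate response rate.
    print(respondent_mean_bias(0.53, 10.0, 10.0))
    # Different means: bias remains even at a high response rate.
    print(respondent_mean_bias(0.90, 10.0, 8.0))
```

When the two group means coincide, the difference is zero at any response rate, whereas a high response rate does not remove bias if respondents and non-respondents differ.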

Strengths and limitations
A significant strength of this study lies in the comprehensive dataset available from the DCCG database, which includes valuable background variables and clinical information for all participants. With a database completeness exceeding 95%, we were provided with a unique opportunity to compare responders and non-responders. By recruiting study participants from this registry, we ensured the assembly of a complete and unbiased sample, allowing for a thorough examination of generalizability.
Nonetheless, we acknowledge certain limitations. Given the cross-sectional study design, PROM-data from non-responders are unavailable, and hence PROM-responses from subsequent responders, who were non-responders until distribution of reminders, may be the closest available. The assumption that subsequent responders may be indicative of non-responders cannot be tested. Multiple PRO scores from different assessment time points would add information regarding the nature of missingness at subsequent time points. Nevertheless, this will still not add information on never-responders.
Whenever missing data is an issue, study-dependent missingness mechanisms should be explored and handled accordingly, taking the specific research question, outcomes, exposures, covariates etc. into account. Therefore, more general advice on specific imputation strategies cannot be given based on this study. However, this study may help to inform the decision on how to handle missing data in similar populations or study designs.
Another limitation pertains to the methods of data registration, as data on for example lifestyle factors, comorbidity, and performance are recorded by the primary surgeon within 30 days after surgery, potentially introducing recall bias. Another limitation of this study is that the three survey invitations do not have individualized links. Consequently, a person may have responded to the initial invitation on day 11 without a reminder. The generalizability to other cancer diagnoses cannot be determined based on our findings.

Conclusion
In this study of long-term colorectal cancer survivors, we showed that initial responders, subsequent responders, and non-responders exhibit comparable demographic and cancer-related characteristics. Among respondents, PROMs were also similar, indicating generalizability.
Assuming PROMs of subsequent responders represent PROMs of the non-responders (had they been available), as suggested by our analysis, it may be reasonable to judge the missingness mechanism as Missing Completely At Random. Hence, imputation methods may be an option to enhance statistical power.

Fig. 2 Responses divided on days. Day 0 equals the distribution day. Reminders are sent on day 10 and day 20 (vertical lines)

Table 1
Data sources
a: Mandatory items

Table 2
Demographic characteristics of the initial responders, subsequent responders and non-responders (N = 1,955) compared using balance diagnostics. Standardized differences > 0.1 indicate imbalance between groups

Initial responders Subsequent responders Non-responders a Standardized difference
a: The survivors were asked not to respond in case of cancer recurrence, and hence this group also includes survivors who are intentional non-responders
b: comparing initial responders to subsequent responders
c: comparing initial responders to non-responders

Table 3
Cancer-related characteristics of initial responders, subsequent responders and non-responders. P-values correspond to the chi-squared test
a: "Yes" is annotated for individuals who submit a stool sample within three months after receiving screening invitation
b: data on TN-stage was not available before January 2016

Table 4
Generalized linear regression models comparing PROM-scores in initial responders (reference group) to subsequent responders (N = 1,046)